Descriptive Analytics#

Package imports#

import numpy as np
import pandas as pd
import plotly.express as px
from sklearn.tree import DecisionTreeClassifier
import os
from sklearn.preprocessing import LabelEncoder
from sklearn.tree import plot_tree
import matplotlib.pyplot as plt
import plotly.graph_objs as go
import statsmodels.api as sm
from statsmodels.formula.api import ols
from statsmodels.formula.api import logit
from scipy import stats
import psycopg2

Data loading#

We begin the exploratory process by loading the data:

conn = psycopg2.connect(
    dbname="neondb",
    user="neondb_owner",
    password="YexCUXob4Oy7",
    host="ep-little-feather-a5rorog4.us-east-2.aws.neon.tech",
    port="5432"
)
cursor = conn.cursor()
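As an aside, the `os` module imported above is never used in the notebook; one natural use for it is to keep database credentials out of the source. A minimal sketch, assuming the conventional libpq environment variable names (PGDATABASE, PGUSER, PGPASSWORD, PGHOST, PGPORT) rather than anything defined by this notebook:

```python
import os

# Sketch (not the notebook's actual setup): read the connection settings
# from the environment instead of hardcoding them in the source.
db_config = {
    "dbname": os.getenv("PGDATABASE", "neondb"),
    "user": os.getenv("PGUSER", ""),
    "password": os.getenv("PGPASSWORD", ""),
    "host": os.getenv("PGHOST", ""),
    "port": os.getenv("PGPORT", "5432"),
}
# conn = psycopg2.connect(**db_config)  # then connect as before
```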

cursor.execute("SELECT * FROM companies")

# Fetch the rows and store them in a DataFrame
data2 = cursor.fetchall()
data = pd.DataFrame(data2)
data.columns=['company', 'sector', 'horizon_days', 'amount', 'date_BUY_fix',
       'date_SELL_fix', 'price_BUY', 'price_SELL', 'Volatility_Buy',
       'Volatility_sell', 'Sharpe_Ratio', 'expected_return_yearly',
       'inflation', 'nominal_return', 'investment', 'ESG_ranking', 'PE_ratio',
       'EPS_ratio', 'PS_ratio', 'PB_ratio', 'NetProfitMargin_ratio',
       'current_ratio', 'roa_ratio', 'roe_ratio']

Let's take a first look at the data:

data.head()
company sector horizon_days amount date_BUY_fix date_SELL_fix price_BUY price_SELL Volatility_Buy Volatility_sell ... investment ESG_ranking PE_ratio EPS_ratio PS_ratio PB_ratio NetProfitMargin_ratio current_ratio roa_ratio roe_ratio
0 HMC AUTO 7 500.0 2016-06-14 2016-06-21 25.600000 26.410000 0.274761 0.274900 ... GOOD 28.5 13.50 1.64 0.32 0.00 2.40 1.11 1.93 4.87
1 MSFT TECH 330 1500.0 2016-07-22 2017-06-16 52.356224 66.406387 0.279043 0.160044 ... GOOD 14.7 23.18 2.05 4.19 5.15 22.53 2.35 11.28 27.29
2 BAC BANK 15 25000.0 2015-01-28 2015-02-12 13.609470 14.925651 0.228531 0.232711 ... GOOD 26.3 42.16 0.38 1.77 0.69 4.62 0.94 0.21 1.88
3 KSS RETAIL 270 50000.0 2015-01-28 2015-10-26 47.355690 36.093388 0.221473 0.292144 ... BAD 12.9 10.87 4.26 0.50 1.55 4.56 1.95 6.03 14.78
4 AAPL TECH 600 10000.0 2015-05-21 2017-01-10 30.166021 28.201811 0.217025 0.227415 ... BAD 16.5 14.06 2.02 3.19 5.08 22.53 1.16 19.56 39.44

5 rows × 24 columns

The describe() and info() functions give us a preliminary view of the kinds of data the DataFrame contains.

data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 900 entries, 0 to 899
Data columns (total 24 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   company                 900 non-null    object 
 1   sector                  900 non-null    object 
 2   horizon_days            900 non-null    int64  
 3   amount                  900 non-null    float64
 4   date_BUY_fix            900 non-null    object 
 5   date_SELL_fix           900 non-null    object 
 6   price_BUY               900 non-null    float64
 7   price_SELL              900 non-null    float64
 8   Volatility_Buy          900 non-null    float64
 9   Volatility_sell         900 non-null    float64
 10  Sharpe_Ratio            900 non-null    float64
 11  expected_return_yearly  900 non-null    float64
 12  inflation               900 non-null    float64
 13  nominal_return          900 non-null    float64
 14  investment              900 non-null    object 
 15  ESG_ranking             900 non-null    float64
 16  PE_ratio                900 non-null    float64
 17  EPS_ratio               900 non-null    float64
 18  PS_ratio                900 non-null    float64
 19  PB_ratio                900 non-null    float64
 20  NetProfitMargin_ratio   900 non-null    float64
 21  current_ratio           900 non-null    float64
 22  roa_ratio               900 non-null    float64
 23  roe_ratio               900 non-null    float64
dtypes: float64(18), int64(1), object(5)
memory usage: 168.9+ KB
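The text above also mentions describe(), whose output is not shown. A minimal sketch of what it returns, using a toy frame in place of `data` (in the notebook the call is simply `data.describe()`):

```python
import pandas as pd

# Toy frame standing in for `data`
toy = pd.DataFrame({"price_BUY": [25.6, 52.4, 13.6],
                    "sector": ["AUTO", "TECH", "BANK"]})

# describe() reports count/mean/std/min/quartiles/max; with mixed dtypes
# it summarizes only the numeric columns by default.
resumen = toy.describe()
```

Passing `include='object'` would instead summarize the categorical columns (count, unique, top, freq).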

Data manipulation#

The columns “company”, “date_BUY_fix” and “date_SELL_fix” add nothing to the predictive process, since they carry information specific to each individual row. We therefore drop them.

clean = ["company", "date_BUY_fix", "date_SELL_fix"]
data_1 = data.drop(clean, axis =1)
data_1.head()
sector horizon_days amount price_BUY price_SELL Volatility_Buy Volatility_sell Sharpe_Ratio expected_return_yearly inflation ... investment ESG_ranking PE_ratio EPS_ratio PS_ratio PB_ratio NetProfitMargin_ratio current_ratio roa_ratio roe_ratio
0 AUTO 7 500.0 25.600000 26.410000 0.274761 0.274900 0.274761 -0.000176 -0.2 ... GOOD 28.5 13.50 1.64 0.32 0.00 2.40 1.11 1.93 4.87
1 TECH 330 1500.0 52.356224 66.406387 0.279043 0.160044 0.279043 0.197311 -0.2 ... GOOD 14.7 23.18 2.05 4.19 5.15 22.53 2.35 11.28 27.29
2 BANK 15 25000.0 13.609470 14.925651 0.228531 0.232711 0.228531 0.007771 -0.5 ... GOOD 26.3 42.16 0.38 1.77 0.69 4.62 0.94 0.21 1.88
3 RETAIL 270 50000.0 47.355690 36.093388 0.221473 0.292144 0.221473 0.005071 -0.5 ... BAD 12.9 10.87 4.26 0.50 1.55 4.56 1.95 6.03 14.78
4 TECH 600 10000.0 30.166021 28.201811 0.217025 0.227415 0.217025 0.403993 -0.5 ... BAD 16.5 14.06 2.02 3.19 5.08 22.53 1.16 19.56 39.44

5 rows × 21 columns

Pie Plot#

conteo_categorias = data_1['investment'].value_counts()

# Build a DataFrame from the category counts
# (the label and value columns need distinct keys)
df_conteo = pd.DataFrame({'categoria': conteo_categorias.index,
                          'conteo': conteo_categorias.values})

# Pie chart with Plotly
fig = px.pie(df_conteo, values='conteo', names='categoria',
             title='Distribution of the investment variable')

# Show the chart
fig.show()

Histogram#

fig = px.histogram(data_1, x='investment', color='investment',
                   labels={'investment': 'investment', 'count': 'Frequency'},
                   title='Distribution of the investment variable')

# Update the chart layout
fig.update_layout(xaxis_title='Category',
                  yaxis_title='Frequency')

# Show the chart
fig.show()
# Overlaid histograms of selected numeric variables, split by investment outcome
df_good = data_1[data_1['investment'] == 'GOOD']
df_bad = data_1[data_1['investment'] == 'BAD']

for col in ['inflation', 'nominal_return', 'amount', 'Sharpe_Ratio', 'roa_ratio']:
    # One histogram per outcome, drawn with partial opacity so they overlay
    histogram_good = go.Histogram(x=df_good[col], opacity=0.5, name='investment = GOOD')
    histogram_bad = go.Histogram(x=df_bad[col], opacity=0.5, name='investment = BAD')

    # Chart layout
    layout = go.Layout(title=f'Histogram of {col}',
                       xaxis=dict(title=col),
                       yaxis=dict(title='Frequency'),
                       barmode='overlay')

    # Build the figure and show it
    fig = go.Figure(data=[histogram_good, histogram_bad], layout=layout)
    fig.show()

These plots show how each variable is distributed and how its distribution relates to the investment category.
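The visual comparison can be backed by a quick numeric summary per category. A minimal sketch with toy data standing in for `data_1` (the notebook equivalent would be `data_1.groupby("investment")[["inflation", "nominal_return"]].mean()`):

```python
import pandas as pd

# Toy data in place of data_1
df = pd.DataFrame({
    "investment": ["GOOD", "GOOD", "BAD", "BAD"],
    "nominal_return": [0.20, 0.10, -0.05, 0.01],
})

# Mean of the variable within each investment category
medias = df.groupby("investment")["nominal_return"].mean()
```

A large gap between the per-category means is the same signal the overlaid histograms show visually.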

Scatter plot#

# Scatter plots of inflation against selected variables, colored by investment
for col in ['nominal_return', 'amount', 'Sharpe_Ratio', 'expected_return_yearly']:
    fig = px.scatter(data_1, x='inflation', y=col, color='investment',
                     title=f'Scatter plot: inflation vs {col}')
    # Show the chart
    fig.show()
# 3-D scatter of the three most informative variables
fig = px.scatter_3d(data_1, x='inflation', y='nominal_return', z='expected_return_yearly',
                    color='investment', title='3-D scatter plot')

# Show the chart
fig.show()

Comparison with the response variable#

The variables behave differently depending on their investment category; “inflation” and “nominal_return” show the most distinctive behavior.

for column in data_1.columns:
    fig = px.box(data_1, x= "investment", y=column, color = "investment")
    fig.show()
  • As the classification tree also showed, the variables with the greatest influence are inflation and nominal_return.

  • sector, horizon_days, amount, price_SELL, price_BUY, Volatility_Buy, Volatility_sell, Sharpe_Ratio, expected_return_yearly, ESG_ranking, PE_ratio, EPS_ratio, PS_ratio, NetProfitMargin_ratio, current_ratio, roa_ratio and roe_ratio show no apparent strong tendency toward either category. That is, visually, they do not seem decisive in classifying an investment as GOOD or BAD.

fig = px.histogram(data, x='sector', color='investment')
fig.show()

We can see that no sector shows a predominant tendency toward good or bad investments. Since sector is the only categorical predictor, we decide to exclude it from model training.

data_1= data_1.drop("sector", axis=1)
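Dropping `sector` is one option; if one wanted to keep it despite the weak visual signal, it could instead be one-hot encoded. A minimal sketch, using a toy column in place of `data_1['sector']`:

```python
import pandas as pd

# Toy sector column; in the notebook this would be data_1["sector"]
df = pd.DataFrame({"sector": ["AUTO", "TECH", "BANK", "TECH"]})

# One indicator column per category, prefixed for readability
dummies = pd.get_dummies(df["sector"], prefix="sector")
```

The resulting indicator columns could then be concatenated onto the numeric features with `pd.concat([...], axis=1)`.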

Correlations#

We use the correlation matrix to check that the independent variables do not have strong linear relationships with one another.

numeric_df = data_1.select_dtypes(include=['int', 'float'])


# Compute the correlation matrix
matriz_correlacion = numeric_df.corr()
matriz_correlacion
horizon_days amount price_BUY price_SELL Volatility_Buy Volatility_sell Sharpe_Ratio expected_return_yearly inflation nominal_return ESG_ranking PE_ratio EPS_ratio PS_ratio PB_ratio NetProfitMargin_ratio current_ratio roa_ratio roe_ratio
horizon_days 1.000000 0.057634 -0.019103 0.041087 -0.000782 0.079032 -0.000782 0.557271 0.000982 0.314202 -0.056352 -0.030855 0.043956 0.058744 0.000010 0.049919 0.063964 0.059054 0.045002
amount 0.057634 1.000000 0.015068 0.019191 0.036155 -0.017119 0.036155 0.022659 0.024025 0.035272 0.041535 -0.025461 -0.005007 0.003251 0.026403 -0.072575 -0.033395 -0.089766 -0.098832
price_BUY -0.019103 0.015068 1.000000 0.975333 -0.074075 -0.054223 -0.074075 0.105182 0.091847 0.046071 0.060159 0.317998 0.679935 0.219801 0.367833 0.112021 0.161695 0.158972 0.004417
price_SELL 0.041087 0.019191 0.975333 1.000000 -0.071000 -0.062839 -0.071000 0.165045 0.083859 0.142205 0.058317 0.304729 0.698000 0.222112 0.350046 0.122658 0.172211 0.161950 0.002011
Volatility_Buy -0.000782 0.036155 -0.074075 -0.071000 1.000000 0.850632 1.000000 0.166545 0.050158 0.065940 -0.079210 0.033027 -0.225247 0.237593 0.230903 -0.156604 0.194057 -0.054318 -0.265809
Volatility_sell 0.079032 -0.017119 -0.054223 -0.062839 0.850632 1.000000 0.850632 0.171310 0.070344 -0.039453 -0.123275 0.035551 -0.211890 0.187969 0.176131 -0.175065 0.175390 -0.040506 -0.258205
Sharpe_Ratio -0.000782 0.036155 -0.074075 -0.071000 1.000000 0.850632 1.000000 0.166545 0.050158 0.065940 -0.079210 0.033027 -0.225247 0.237593 0.230903 -0.156604 0.194057 -0.054318 -0.265809
expected_return_yearly 0.557271 0.022659 0.105182 0.165045 0.166545 0.171310 0.166545 1.000000 0.022885 0.533747 -0.088662 0.085136 0.038070 0.391668 0.361000 0.199539 0.263973 0.243156 -0.021053
inflation 0.000982 0.024025 0.091847 0.083859 0.050158 0.070344 0.050158 0.022885 1.000000 0.003025 -0.004853 -0.007222 -0.008672 0.048510 0.116725 -0.034308 -0.009438 0.041586 0.038014
nominal_return 0.314202 0.035272 0.046071 0.142205 0.065940 -0.039453 0.065940 0.533747 0.003025 1.000000 -0.105205 0.025856 0.072892 0.118291 0.038219 0.152078 0.101896 0.134162 0.049182
ESG_ranking -0.056352 0.041535 0.060159 0.058317 -0.079210 -0.123275 -0.079210 -0.088662 -0.004853 -0.105205 1.000000 0.112499 -0.106382 0.164561 0.012402 -0.138176 0.118752 -0.429422 -0.469440
PE_ratio -0.030855 -0.025461 0.317998 0.304729 0.033027 0.035551 0.033027 0.085136 -0.007222 0.025856 0.112499 1.000000 -0.042664 0.117468 0.352464 -0.036344 0.054029 -0.029921 -0.064672
EPS_ratio 0.043956 -0.005007 0.679935 0.698000 -0.225247 -0.211890 -0.225247 0.038070 -0.008672 0.072892 -0.106382 -0.042664 1.000000 0.078800 -0.045096 0.329119 0.130749 0.311497 0.243385
PS_ratio 0.058744 0.003251 0.219801 0.222112 0.237593 0.187969 0.237593 0.391668 0.048510 0.118291 0.164561 0.117468 0.078800 1.000000 0.483868 0.522420 0.822391 0.446526 -0.029877
PB_ratio 0.000010 0.026403 0.367833 0.350046 0.230903 0.176131 0.230903 0.361000 0.116725 0.038219 0.012402 0.352464 -0.045096 0.483868 1.000000 -0.043134 0.182139 0.117530 -0.176910
NetProfitMargin_ratio 0.049919 -0.072575 0.112021 0.122658 -0.156604 -0.175065 -0.156604 0.199539 -0.034308 0.152078 -0.138176 -0.036344 0.329119 0.522420 -0.043134 1.000000 0.474356 0.733180 0.560262
current_ratio 0.063964 -0.033395 0.161695 0.172211 0.194057 0.175390 0.194057 0.263973 -0.009438 0.101896 0.118752 0.054029 0.130749 0.822391 0.182139 0.474356 1.000000 0.452459 0.111891
roa_ratio 0.059054 -0.089766 0.158972 0.161950 -0.054318 -0.040506 -0.054318 0.243156 0.041586 0.134162 -0.429422 -0.029921 0.311497 0.446526 0.117530 0.733180 0.452459 1.000000 0.710006
roe_ratio 0.045002 -0.098832 0.004417 0.002011 -0.265809 -0.258205 -0.265809 -0.021053 0.038014 0.049182 -0.469440 -0.064672 0.243385 -0.029877 -0.176910 0.560262 0.111891 0.710006 1.000000
heatmap = go.Heatmap(
    z=matriz_correlacion.values,   # correlation values
    x=matriz_correlacion.columns,  # column names
    y=matriz_correlacion.columns,  # row names
    colorscale='Viridis',          # color palette
)

layout = go.Layout(title='Correlation heatmap')

fig = go.Figure(data=[heatmap], layout=layout)

# Show the heatmap
fig.show()
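Rather than scanning the heatmap by eye, the strongly correlated pairs can be listed programmatically; note, for instance, that the matrix above reports a correlation of exactly 1.000 between Sharpe_Ratio and Volatility_Buy. A sketch with a toy frame in place of `numeric_df`:

```python
import numpy as np
import pandas as pd

# Toy frame with one deliberately collinear column (b = 2a)
rng = np.random.default_rng(0)
a = rng.normal(size=100)
df = pd.DataFrame({"a": a, "b": 2.0 * a, "c": rng.normal(size=100)})

corr = df.corr()
# Keep only the upper triangle so each pair appears once
mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pares = corr.where(mask).stack().dropna()

# Pairs whose absolute correlation exceeds a chosen threshold
altas = pares[pares.abs() > 0.8]
```

Applied to `numeric_df`, this would flag the Sharpe_Ratio/Volatility_Buy pair (and Volatility_Buy/Volatility_sell at 0.85) as candidates to drop or combine before modeling.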

Exploration: logistic regression model#

As we have seen, most of the variables do not appear to help explain the response variable. As an exploratory step, we therefore fit a logistic regression model to learn a bit more about the data we are working with.

# Logit requires a numeric response: encode GOOD -> 1, BAD -> 0
y = data_1['investment'].map({'GOOD': 1, 'BAD': 0})

X = data_1[['Volatility_sell', 'expected_return_yearly', 'inflation',
            'nominal_return', 'price_SELL', 'price_BUY']]

modelo_logit = sm.Logit(y, X)
resultado_modelo = modelo_logit.fit()
p_valores = resultado_modelo.pvalues
variables_significativas = p_valores[p_valores < 0.05]
variables_significativas
Volatility_sell           2.623637e-13
expected_return_yearly    1.624640e-02
inflation                 1.012144e-22
nominal_return            8.670997e-23
price_SELL                1.149075e-04
price_BUY                 1.378763e-04
dtype: float64
resultado_modelo.summary()
Logit Regression Results
Dep. Variable: investment No. Observations: 900
Model: Logit Df Residuals: 894
Method: MLE Df Model: 5
Date: Sun, 21 Apr 2024 Pseudo R-squ.: 0.6825
Time: 16:12:04 Log-Likelihood: -190.05
converged: True LL-Null: -598.63
Covariance Type: nonrobust LLR p-value: 2.246e-174
coef std err z P>|z| [0.025 0.975]
Volatility_sell -6.5713 0.899 -7.312 0.000 -8.333 -4.810
expected_return_yearly -4.4793 1.864 -2.403 0.016 -8.132 -0.826
inflation -7.1526 0.729 -9.811 0.000 -8.581 -5.724
nominal_return 24.9465 2.539 9.826 0.000 19.971 29.922
price_SELL -0.0258 0.007 -3.857 0.000 -0.039 -0.013
price_BUY 0.0279 0.007 3.812 0.000 0.014 0.042


Possibly complete quasi-separation: A fraction 0.38 of observations can be
perfectly predicted. This might indicate that there is complete
quasi-separation. In this case some parameters will not be identified.
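As a sanity check on the summary, two of the reported quantities can be reproduced by hand: McFadden's pseudo R-squared from the two log-likelihoods, and an odds ratio from a coefficient. The numbers below are taken from the summary above.

```python
import math

# Log-likelihoods reported in the summary
ll_model, ll_null = -190.05, -598.63

# McFadden's pseudo R-squared: 1 - LL_model / LL_null
pseudo_r2 = 1 - ll_model / ll_null   # matches the reported 0.6825

# Odds ratio: exp(coef) is the multiplicative change in the odds of GOOD
# per one-unit increase in the predictor
or_price_sell = math.exp(-0.0258)    # slightly below 1: higher price_SELL lowers the odds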

Con el método anteriormente utilizado de exploración podemos confirmar nuevamente que: “inflation” y “nominal return” son las que mayor capacidad de aporte tienen. Seguido de Volatility_sell, expected_return_yearly, nominal_return e inflation.

Como conclusión del proceso exploratorio de datos, podríamos mencionar que la mayor parte de las variables presentes en el data frame no tienen una fuerte inclinación a hacia la variable respuesta categorica asociada. Como fue posible revisar durante los box plots y durante el arbol de regresión expuesto para el estudio de las variables, solamente algunas variables parecen ser determinantes para conocer si una inversión será buena o mala. Adicionalmente, por medio de un modelo de regresion hecho en modo didactivo y exploratorio, fue posible observa que la mayor parte de las variables no son significativas a nivel predictivo.